Malcat tip: fast unpacking of RTF payloads

Sat 10 August 2024 malcat team tutorial, file format, rtf, emulation
Sample:
76dfb0e8e48b44cba5c8c74792169b58e3843809f97430e113aa137815361aa2.rtf (Bazaar, VT)
Infection chain:
RTF document -> Equation object -> Shellcode
Tools used:
Malcat
Difficulty:
Easy

Malcat is a binary analysis program. As such, it has limited support for text formats such as XML or RTF. But that doesn't mean it's completely useless for these file types. Since I'm often asked for RTF support, in particular for the countless RTF documents exploiting CVE-2017-11882, I'll show you a small trick to extract the payload of RTF documents using tools already built into Malcat. This won't work with all RTFs, but it should work with most, and it only takes a few seconds!

Some context: RTF is hard

The RTF file format is a textual document formatting specification supported by Windows and all Office versions. Not only can RTF files display text, but they are also able to embed binary objects, which has made this format popular among malware authors.

For the purpose of malware triage, handling RTF files often means extracting the binary payload, which is the purpose of tools such as the great rtfobj from Philippe Lagadec. Nowadays, the most common binary payloads are equation objects embedded inside an OLE container and exploiting CVE-2017-11882, but you'll also come across the occasional CVE-2018-0798.

Sadly, the RTF file format is very obfuscation friendly. RTF commands, such as the objdata command used to embed binary objects, can be interrupted by arbitrary commands, even malformed ones, which makes retrieving the binary stream more difficult.

Figure 2: Obfuscated objdata command
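To make the problem concrete, here is a toy Python illustration. The fragment is hand-made (not taken from any sample), and the control word \xyzzy and the group {\*\junk} are arbitrary placeholders standing in for the junk an attacker would insert. A naive extractor that reads the hex run right after objdata stops at the first interruption:

```python
import re

# Hand-made toy fragment (not from a real sample): the objdata hex stream is
# interrupted by a made-up control word and a junk group.
FRAGMENT = rb"{\object{\*\objdata 01050000\xyzzy 02000000{\*\junk}0b000000}}"

# Naive approach: capture only the hex run that directly follows \objdata.
NAIVE = re.compile(rb"\\objdata\s*([0-9A-Fa-f\s]+)")

def naive_objdata(rtf: bytes) -> bytes:
    """Return the hex text right after objdata, stopping at the first non-hex byte."""
    match = NAIVE.search(rtf)
    return match.group(1) if match else b""

print(naive_objdata(FRAGMENT))   # b'01050000': the rest of the object is lost
```

In this particular fragment, simply dropping every non-hex character instead would recover all three chunks, which is the idea behind the hex decode trick described in the next section.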

Even worse, malicious RTF documents can take advantage of known Office bugs to produce invalid RTF commands that don't respect the RTF specification but will still be interpreted correctly by Office. A great article by Kaspersky illustrates the different techniques used by attackers.

Abusing old Office bugs in RTF documents has the drawback of limiting the Office versions that can successfully open the document. But this limitation is not very important for attackers exploiting already old vulnerabilities such as CVE-2017-11882.

So in the end, even if Malcat were to implement a sound RTF parser supporting the full RTF specification, that would not guarantee it could parse all malicious RTF documents.

Extracting the binary payload

Theory

Luckily, 99% of the malicious RTF documents exploiting CVE-2017-11882 we see in the wild only use very basic obfuscation. To unpack the binary stream with Malcat, just follow these simple steps:

  1. Locate the objdata command
  2. Select all data starting right after the objdata command, up to the end of the file
  3. Transform the selection using Malcat's hex decode algorithm (check ignore non-hexa character)
  4. If it does not work (i.e. the output looks like gibberish), try removing the first byte(s) of the stream by prepending a cut X bytes transform to the chain

Adjust the number of initial bytes to cut as needed, then let Malcat do its magic. If the payload contains an OLE document, Malcat's file carving will find it. If it is directly an OLE object, the disassembly view can help you spot the shellcode. This won't work with all RTF documents (it fails on those with obfuscation inside the bin command), but it will work for most of what you find in the wild.
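The cut and hex decode steps can be sketched in a few lines of Python, assuming the selection is simply the raw bytes following objdata. This is our own approximation of what the two transforms do, not Malcat's actual code, and hex_decode_lenient is a hypothetical name:

```python
import string

# Byte values of 0-9, a-f and A-F.
HEX_DIGITS = frozenset(string.hexdigits.encode())

def hex_decode_lenient(selection: bytes, cut: int = 0) -> bytes:
    """Approximate the transform chain: drop `cut` leading bytes ("cut X
    bytes"), then hex-decode while ignoring every non-hex character."""
    digits = bytes(c for c in selection[cut:] if c in HEX_DIGITS)
    digits = digits[: len(digits) // 2 * 2]     # drop a dangling nibble, if any
    return bytes.fromhex(digits.decode("ascii"))

print(hex_decode_lenient(rb" 01050000\xyzzy 02000000}"))  # b'\x01\x05\x00\x00\x02\x00\x00\x00'
print(hex_decode_lenient(b"a0105"))         # b'\xa0\x10': misaligned
print(hex_decode_lenient(b"a0105", cut=1))  # b'\x01\x05': realigned
```

The last two calls show why step 4 exists: since a to f are valid hex digits, a single stray hex-looking character (for instance a letter from a junk control word) shifts every following nibble, and cutting one byte realigns the stream. It also explains why the selection starts right after objdata: the b, d and a of the keyword itself would otherwise leak into the decoded data.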

Practice

Let us give it a try with the file 76dfb0e8e48b44cba5c8c74792169b58e3843809f97430e113aa137815361aa2.rtf (Bazaar, VT). This RTF file uses a little bit of obfuscation that somehow messes with rtfobj's parser:

Figure 3: RTFObj can't extract the embedded OLE object

So let us just locate the objdata command. You can summon Malcat's find dialog using the Ctrl+F shortcut:

Figure 4: Finding the objdata command

Then select all bytes following the objdata command, up to the end of the file, and open the transform dialog via the menu or Ctrl+T. We'll apply the transforms described above:

Figure 5: Decoding the embedded object

We see the infamous equation.3 string, meaning we are looking at the equation object. Great! Click on New file and let us proceed!
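The trial-and-error part of step 4 can also be automated outside of Malcat. The Python sketch below is our own approximation (unpack_objdata and max_cut are hypothetical names): it locates the first objdata string, takes everything up to the end of the file, tries a few cut values and stops at the first output mentioning equation.3. It also reports where an OLE compound file header (D0 CF 11 E0 A1 B1 1A E1), the kind of signature file carving looks for, appears in the decoded data. Like the manual method, it assumes the objdata keyword itself is not broken up by obfuscation:

```python
import string
from typing import Optional, Tuple

HEX_DIGITS = frozenset(string.hexdigits.encode())
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")   # OLE compound file signature

def decode(selection: bytes, cut: int) -> bytes:
    """Cut `cut` leading bytes, then hex-decode ignoring non-hex characters."""
    digits = bytes(c for c in selection[cut:] if c in HEX_DIGITS)
    return bytes.fromhex(digits[: len(digits) // 2 * 2].decode("ascii"))

def unpack_objdata(rtf: bytes, max_cut: int = 4) -> Optional[Tuple[int, bytes, int]]:
    """Return (cut, decoded, ole_offset) for the first cut value whose output
    mentions equation.3, or None. ole_offset is -1 if no OLE header is found."""
    pos = rtf.find(b"objdata")                  # assumes the keyword is intact
    if pos < 0:
        return None
    selection = rtf[pos + len(b"objdata"):]     # everything up to the end of file
    for cut in range(max_cut):
        decoded = decode(selection, cut)
        if b"equation.3" in decoded.lower():    # case-insensitive marker check
            return cut, decoded, decoded.find(OLE_MAGIC)
    return None
```

Checking for a known marker is simply a programmatic stand-in for "the output no longer looks like gibberish"; in Malcat, your eyes and the file carving do that job.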

Emulating the shellcode

Theory

We are now facing the Equation exploit. Well, not exactly: since we were not very precise during object extraction, there may be some junk bytes before and after the object. More annoyingly, we don't really know where the shellcode of the exploit starts. Fortunately, most of the CVE-2017-11882 exploits I have seen in the wild follow the same structure, presented below:

  1. The equation header bytes, triggering the vulnerability
  2. An exploit-dependent shellcode prologue that'll fetch a few addresses from the stack/exploited program and jump to ...
  3. The shellcode decryption loop, which will decrypt and jump to ...
  4. The (encrypted) shellcode body

While steps 1 and 2 may vary depending on the exploit used, the third step is very often the same. Even better, the decryption loop is often assembled as spaghetti code, i.e. the control flow is linked by many unnecessary jumps. Thus, locating the decryption loop often means locating a sequence of adjacent jump instructions. Alternatively, you can look for other typical prologues such as push sequences or arithmetic obfuscation.
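If you want to prefilter candidates outside of Malcat, a crude byte-level heuristic is to look for back-to-back x86 JMP instructions. The sketch below is our own naive linear scan, not how Malcat's disassembler works: it only recognizes the EB (short, rel8) and E9 (near, rel32) encodings, ignores calls and jumps separated by other instructions, and will happily report false positives in random data. Treat its output as offsets worth a look in the disassembly view, nothing more:

```python
from typing import List, Tuple

# Size in bytes of the two JMP encodings we recognise.
JMP_SIZES = {0xEB: 2, 0xE9: 5}   # EB rel8 (short), E9 rel32 (near)

def find_jump_runs(code: bytes, min_run: int = 3) -> List[Tuple[int, int]]:
    """Return (offset, count) for each run of at least `min_run` JMP
    instructions laid out back to back in `code`."""
    runs = []
    i = 0
    while i < len(code):
        start, count = i, 0
        # Walk forward as long as we keep landing on complete JMP instructions.
        while i < len(code) and code[i] in JMP_SIZES and i + JMP_SIZES[code[i]] <= len(code):
            i += JMP_SIZES[code[i]]
            count += 1
        if count >= min_run:
            runs.append((start, count))   # resume scanning after the run
        else:
            i = start + 1                 # too short: retry from the next byte
    return runs

# Five junk bytes, then EB 00 / EB 00 / E9 00000000, then two NOPs.
print(find_jump_runs(b"\x00" * 5 + b"\xeb\x00\xeb\x00\xe9\x00\x00\x00\x00" + b"\x90\x90"))  # [(5, 3)]
```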

So when I want to investigate the shellcode without investing too much time, I usually:

  1. Search in the disassembly view for the decryption loop prologue
  2. Define a function at the first jump/call
  3. Emulate the shellcode
  4. If it does not work (the shellcode body is not decrypted), try with the next jump/call

Again, it won't work for all exploits, but it will for most of them. The goal here is to give it a quick try. For more complex cases, there is always the good old sandbox.
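The try-the-next-candidate loop above can be expressed as a small Python sketch. Here emulate is a hypothetical callback standing in for whatever emulator you use (for instance Malcat's Speakeasy user script), assumed to return the shellcode buffer after emulation from a given offset. The success test, "more printable strings than before", is our own crude proxy for "the shellcode body got decrypted":

```python
import re
from typing import Callable, Iterable, Optional

# Runs of 6+ printable ASCII characters: a crude "readable data" signal.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{6,}")

def count_strings(buf: bytes) -> int:
    return len(PRINTABLE_RUN.findall(buf))

def first_working_candidate(
    shellcode: bytes,
    candidates: Iterable[int],
    emulate: Callable[[bytes, int], bytes],   # hypothetical emulator callback
) -> Optional[int]:
    """Try each candidate entry offset in order; return the first one after
    which the emulated buffer holds more readable strings than before."""
    baseline = count_strings(shellcode)
    for entry in candidates:
        if count_strings(emulate(shellcode, entry)) > baseline:
            return entry
    return None
```

For a quick sanity check, you can pass a stub emulate that XOR-decodes the body only when given the right offset; the function should then return that offset and None for every other candidate list.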

Practice

So, first things first: let us search for the start of the decryption loop. In this case we will look for spaghetti code immediately following data or bad code. To do so, open the disassembly view using the F3 shortcut and slowly scroll down until you find something interesting.

Note that the disassembly is all grey in Malcat: we are facing an arbitrary object which is not an executable program. Malcat can't possibly know which region is code and which one is data, and thus assumes that everything is data. Data is shown greyed out in the disassembly view.

At offset 155, we find a promising pattern:

Figure 6: Locating the decryption loop spaghetti code

Now we can emulate the shellcode. First, make sure the speakeasy Python package is installed in your local Python installation. If you are on Windows, also double-check that Use system python interpreter is checked in the General options tab (install Python 3.8 if necessary).

Then you'll have to tell Malcat that we have found some code: the emulation script emulates the first function it finds, but there has to be one! To do this, simply right-click on the first jump/call and select Force function start. You can also help Malcat a bit by setting the architecture (in the status bar) to x86. Now let us emulate the shellcode by running a user script (Ctrl+U) and choosing emulation > Speakeasy (shellcode). If it doesn't work (i.e. the shellcode body is not decrypted), go back to the shellcode (Ctrl+W), undefine the function (Ctrl+Z) and try the next jump/call candidate. It usually doesn't take more than a few tries! Let us see that in action:

Figure 7: Emulating the decryption loop

Congrats, you have successfully unpacked a weaponized RTF document in only a few minutes (or seconds if you're quick :)!